Crime Pattern Analysis in Los Angeles (2020–2024)

INTRODUCTION

Crime poses significant challenges to society, impacting community reputation, individual well-being, and overall economic growth. It affects public safety, urban development, and investment potential, making crime analysis an essential component of urban management and law enforcement strategies.

According to the 2020 U.S. Census, Los Angeles (LA), California, is the second most populous city in the United States, with a population of 3,898,747 (Editor & Tikkanen, 2024). The city's socioeconomic diversity, rapid urbanization, and presence of informal settlements contribute to crime occurrence and escalation. LA has experienced various property crimes (burglary, theft, shoplifting) and violent crimes (assault, homicide, rape, lynching), which negatively impact public perception, tourism, trade, and economic stability.

This study aims to conduct a comprehensive spatiotemporal analysis of crime trends in Los Angeles from 2020 to 2024, leveraging data-driven methodologies to extract meaningful insights. The study follows a structured workflow, incorporating:

  1. Data Preprocessing: Cleaning and structuring raw crime datasets for analysis.
  2. Descriptive Statistical Analysis: Categorizing and summarizing crime data to identify trends and distributions.
  3. Temporal Analysis: Conducting time-series analysis (2020-2024) to examine seasonal and annual crime patterns.
  4. Predictive Modeling: Utilizing the Prophet forecasting model to predict future crime rates.
  5. Spatial Analysis: Implementing heatmaps and geospatial clustering to identify crime hotspots.

1. Data Preprocessing

In [35]:
# Installing and importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objs as go

!pip install prophet
from prophet import Prophet

!pip install folium
import folium  # required for folium.Map in the spatial analysis
from folium.plugins import HeatMap
from folium import features
In [10]:
# Loading crime data
crime_data = pd.read_csv('Crime_Data_from_2020_to_Present.csv')
crime_data.describe()
Out[10]:
                DR_NO        TIME OCC            AREA     Rpt Dist No        Part 1-2          Crm Cd        Vict Age       Premis Cd  Weapon Used Cd        Crm Cd 1        Crm Cd 2        Crm Cd 3        Crm Cd 4             LAT             LON
count    9.786280e+05   978628.000000   978628.000000   978628.000000   978628.000000   978628.000000   978628.000000   978613.000000   325959.000000   978617.000000    68816.000000     2309.000000        64.00000   978628.000000   978628.000000
mean     2.196564e+08     1338.802627       10.702561     1116.686084        1.404785      500.810635       29.122904      306.181502      363.815372      500.564847      958.156344      984.192724       991.21875       33.995399     -118.081108
std      1.290395e+07      651.622947        6.107280      610.836054        0.490851      206.309796       21.961531      218.908131      123.673988      206.107451      110.251477       51.506344        27.06985        1.640056        5.684520
min      8.170000e+02        1.000000        1.000000      101.000000        1.000000      110.000000       -4.000000      101.000000      101.000000      110.000000      210.000000      310.000000       821.00000        0.000000     -118.667600
25%      2.106073e+08      900.000000        5.000000      589.000000        1.000000      331.000000        0.000000      101.000000      311.000000      331.000000      998.000000      998.000000       998.00000       34.014600     -118.430500
50%      2.208116e+08     1420.000000       11.000000     1141.000000        1.000000      442.000000       30.000000      203.000000      400.000000      442.000000      998.000000      998.000000       998.00000       34.058900     -118.322500
75%      2.309110e+08     1900.000000       16.000000     1617.000000        2.000000      626.000000       44.000000      501.000000      400.000000      626.000000      998.000000      998.000000       998.00000       34.164900     -118.273900
max      2.499253e+08     2359.000000       21.000000     2199.000000        2.000000      956.000000      120.000000      976.000000      516.000000      956.000000      999.000000      999.000000       999.00000       34.334300        0.000000
In [11]:
# Dropping columns that are not needed for the analysis
columns_to_drop = ['DR_NO', 'Date Rptd', 'TIME OCC', 'AREA', 'Rpt Dist No', 'Part 1-2', 'Crm Cd','Mocodes', 'Premis Cd', 'Weapon Used Cd', 'Weapon Desc', 'Status', 'Crm Cd 1', 'Crm Cd 2', 'Crm Cd 3', 'Crm Cd 4', 'Cross Street' ]
crime_data = crime_data.drop(columns=columns_to_drop)
crime_data
Out[11]:
DATE OCC AREA NAME Crm Cd Desc Vict Age Vict Sex Vict Descent Premis Desc Status Desc LOCATION LAT LON
0 03/01/2020 12:00:00 AM Wilshire VEHICLE - STOLEN 0 M O STREET Adult Arrest 1900 S LONGWOOD AV 34.0375 -118.3506
1 02/08/2020 12:00:00 AM Central BURGLARY FROM VEHICLE 47 M O BUS STOP/LAYOVER (ALSO QUERY 124) Invest Cont 1000 S FLOWER ST 34.0444 -118.2628
2 11/04/2020 12:00:00 AM Southwest BIKE - STOLEN 19 X X MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC) Invest Cont 1400 W 37TH ST 34.0210 -118.3002
3 03/10/2020 12:00:00 AM Van Nuys SHOPLIFTING-GRAND THEFT ($950.01 & OVER) 19 M O CLOTHING STORE Invest Cont 14000 RIVERSIDE DR 34.1576 -118.4387
4 08/17/2020 12:00:00 AM Hollywood THEFT OF IDENTITY 28 M H SIDEWALK Invest Cont 1900 TRANSIENT 34.0944 -118.3277
... ... ... ... ... ... ... ... ... ... ... ...
978623 07/23/2024 12:00:00 AM Wilshire VEHICLE - STOLEN 0 NaN NaN STREET Invest Cont 4000 W 23RD ST 34.0362 -118.3284
978624 01/15/2024 12:00:00 AM Central VANDALISM - MISDEAMEANOR ($399 OR UNDER) 0 X X HOTEL Invest Cont 1300 W SUNSET BL 34.0685 -118.2460
978625 07/19/2024 12:00:00 AM Devonshire TRESPASSING 0 X X MTA - ORANGE LINE - CHATSWORTH Invest Cont 10000 OLD DEPOT PLAZA RD 34.2500 -118.5990
978626 04/24/2024 12:00:00 AM Southwest ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT 70 F W SIDEWALK Invest Cont FLOWER ST 34.0215 -118.2868
978627 08/12/2024 12:00:00 AM Van Nuys VEHICLE - STOLEN 0 NaN NaN PARKING LOT Invest Cont 6900 VESPER AV 34.1961 -118.4510

978628 rows × 11 columns

In [12]:
# Dropping rows that contain NaN values
crime_data = crime_data.dropna()
# Verifying that no NaN values remain in any column
na_counts = crime_data.isna().sum()
print(na_counts)
DATE OCC        0
AREA NAME       0
Crm Cd Desc     0
Vict Age        0
Vict Sex        0
Vict Descent    0
Premis Desc     0
Status Desc     0
LOCATION        0
LAT             0
LON             0
dtype: int64
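The summary statistics above show a minimum LAT of 0, a maximum LON of 0, and a minimum Vict Age of -4, none of which dropna() removes. The following is a minimal sketch of one way to screen out such rows. The sample rows are hypothetical, and treating ages of zero or below as "unknown" is an assumption rather than something the dataset documentation in this notebook confirms.

```python
import pandas as pd

# Hypothetical sample rows that mimic the columns used in this notebook
sample = pd.DataFrame({
    'Vict Age': [0, 47, -4, 19, 120],
    'LAT': [34.0375, 34.0444, 0.0, 34.1576, 34.0944],
    'LON': [-118.3506, -118.2628, 0.0, -118.4387, -118.3277],
})

# Coordinates of (0, 0) cannot be in Los Angeles, so treat them as placeholders
valid_coords = (sample['LAT'] != 0) & (sample['LON'] != 0)

# Assumption: ages of zero or below mean the age was not recorded
known_age = sample['Vict Age'] > 0

cleaned = sample[valid_coords & known_age]
print(cleaned)
```

Whether to drop these rows or only exclude them from the age and map plots is a design choice; dropping them everywhere would also remove incidents such as vehicle thefts that have no victim age.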

2. Descriptive Statistical Analysis

In [13]:
crime_data = crime_data.copy()

# Defining categories based on keywords
def categorize_crime(description):
    if any(keyword in description for keyword in ['ASSAULT', 'BATTERY']):
        return 'Assault/Battery'
    elif any(keyword in description for keyword in ['BURGLARY', 'THEFT', 'SHOPLIFTING', 'STOLEN']):
        return 'Theft/Burglary'
    elif any(keyword in description for keyword in ['ARSON']):
        return 'Arson'
    elif any(keyword in description for keyword in ['CHILD', 'PORNOGRAPHY']):
        return 'Child-related Crimes'
    elif any(keyword in description for keyword in ['ROBBERY']):
        return 'Robbery'
    elif any(keyword in description for keyword in ['FRAUD', 'EMBEZZLEMENT']):
        return 'Fraud/Financial Crimes'
    elif any(keyword in description for keyword in ['FIREARMS', 'WEAPONS', 'SHOTS']):
        return 'Firearm/Weapon Offense'
    else:
        return 'Other'

# Applying the categorization
crime_data.loc[:, 'Crime Category 1'] = crime_data['Crm Cd Desc'].apply(categorize_crime)

# Displaying the distinct crime categories
print(crime_data['Crime Category 1'].unique())
['Theft/Burglary' 'Assault/Battery' 'Other' 'Child-related Crimes'
 'Robbery' 'Fraud/Financial Crimes' 'Arson' 'Firearm/Weapon Offense']
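Because categorize_crime returns on the first keyword match, the order of its branches decides the category of any description that contains keywords from more than one group. The sketch below restates the same rules as an ordered list (an alternative layout, not the notebook's own code) and runs it on descriptions taken from the data preview above. The names CATEGORY_RULES and categorize_by_rules are hypothetical.

```python
# Ordered (keywords, category) rules; the first matching rule wins
CATEGORY_RULES = [
    (('ASSAULT', 'BATTERY'), 'Assault/Battery'),
    (('BURGLARY', 'THEFT', 'SHOPLIFTING', 'STOLEN'), 'Theft/Burglary'),
    (('ARSON',), 'Arson'),
    (('CHILD', 'PORNOGRAPHY'), 'Child-related Crimes'),
    (('ROBBERY',), 'Robbery'),
    (('FRAUD', 'EMBEZZLEMENT'), 'Fraud/Financial Crimes'),
    (('FIREARMS', 'WEAPONS', 'SHOTS'), 'Firearm/Weapon Offense'),
]

def categorize_by_rules(description):
    """Return the category of the first rule whose keyword appears in the text."""
    for keywords, category in CATEGORY_RULES:
        if any(keyword in description for keyword in keywords):
            return category
    return 'Other'

# Descriptions copied from the data preview shown earlier
samples = [
    'VEHICLE - STOLEN',
    'THEFT OF IDENTITY',
    'TRESPASSING',
    'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT',
]
results = {text: categorize_by_rules(text) for text in samples}
for text, category in results.items():
    print(f'{text!r} -> {category}')
```

Note that 'THEFT OF IDENTITY' lands in 'Theft/Burglary' because the THEFT keyword is checked before the fraud keywords; reordering the rules would change that outcome.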
In [14]:
# Counting the records for each crime category
crime_category_counts = crime_data['Crime Category 1'].value_counts()

# Plotting a bar chart
plt.figure(figsize=(10, 6))
bar_chart = crime_category_counts.plot(kind='bar', color='skyblue', edgecolor='black')

# Adding titles and labels
plt.title('Count of Records by Crime Category', fontsize=16)
plt.xlabel('Crime Category', fontsize=14)
plt.ylabel('Record Count', fontsize=14)
plt.xticks(rotation=45, ha='right', fontsize=12) 

for index, value in enumerate(crime_category_counts):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')

plt.tight_layout()  
plt.show()
In [15]:
# Categorizing 'Vict Age' into 10-year intervals
age_bins = list(range(0, 101, 10))  
labels = [f'{i}-{i+9}' for i in age_bins[:-1]]  
crime_data['Age Category'] = pd.cut(crime_data['Vict Age'], bins=age_bins, labels=labels, right=False)

# Counting the number of crimes per age category
age_category_count = crime_data['Age Category'].value_counts().sort_index()

# Plotting the results
plt.figure(figsize=(10, 6))
age_category_count.plot(kind='bar', color='skyblue')
plt.title('Number of victims by Age Category')
plt.xlabel('Age Category')
plt.ylabel('Crime Count')
plt.xticks(rotation=45)

for index, value in enumerate(age_category_count):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')
plt.show()
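The binning above uses right=False, so each interval includes its lower edge and excludes its upper edge. A small sketch with hand-picked ages (hypothetical values, chosen to sit on the bin edges) shows which ages fall outside the bins and become NaN.

```python
import pandas as pd

# Same bin edges and labels as in the cell above
age_bins = list(range(0, 101, 10))
labels = [f'{i}-{i+9}' for i in age_bins[:-1]]

# Hypothetical ages chosen to probe the bin edges
ages = pd.Series([-4, 0, 9, 10, 99, 100, 120])
binned = pd.cut(ages, bins=age_bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'bin': binned}))
```

Negative ages and ages of 100 or more receive no bin and are silently left out of value_counts(), while an age of 0 is counted in the '0-9' group.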
In [16]:
# Counting the number of crimes for each victim sex
victim_sex_count = crime_data['Vict Sex'].value_counts()

# Plotting the results
plt.figure(figsize=(8, 5))
victim_sex_count.plot(kind='bar', color='skyblue')
plt.title('Number of Crimes Vs Victim Sex')
plt.xlabel('Victim Sex')
plt.ylabel('Crime Count')
plt.xticks(rotation=0)  
for index, value in enumerate(victim_sex_count):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')

plt.show()
In [17]:
# Converting 'DATE OCC' to datetime and extracting the day of the week
crime_data['DATE OCC'] = pd.to_datetime(crime_data['DATE OCC'], format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
crime_data['Day_of_Week'] = crime_data['DATE OCC'].dt.day_name()
# Counting the number of incidents for each day of the week
day_of_week_counts = crime_data['Day_of_Week'].value_counts()

# Plotting a bar chart
plt.figure(figsize=(10, 6))
bar_chart = day_of_week_counts.plot(kind='bar', color='skyblue', edgecolor='black')

# Adding titles and labels
plt.title('Crimes by Day of the Week', fontsize=16)
plt.xlabel('Day of Week', fontsize=14)
plt.ylabel('Number of Incidents', fontsize=14)
plt.xticks(rotation=45, ha='right', fontsize=12) 

for index, value in enumerate(day_of_week_counts):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')

plt.tight_layout()  
plt.show()
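value_counts() sorts the days by frequency, so the bars do not follow the calendar week. If a Monday-to-Sunday order is preferred, one option is to reindex the counts, as in this minimal sketch. The dates are taken from the data preview above; weekday_order is a name introduced here for illustration.

```python
import pandas as pd

# A few 'DATE OCC' strings copied from the data preview
dates = pd.to_datetime(pd.Series([
    '03/01/2020 12:00:00 AM',
    '02/08/2020 12:00:00 AM',
    '11/04/2020 12:00:00 AM',
    '03/10/2020 12:00:00 AM',
    '08/17/2020 12:00:00 AM',
]), format='%m/%d/%Y %I:%M:%S %p')

weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Count incidents per day, then force calendar order (missing days become 0)
counts = dates.dt.day_name().value_counts().reindex(weekday_order, fill_value=0)
print(counts)
```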

3. Temporal Analysis

In [18]:
# Deriving year and year-month columns ('DATE OCC' was converted to datetime in the previous cell)

crime_data['Year'] = crime_data['DATE OCC'].dt.year
crime_data['Year-Month'] = crime_data['DATE OCC'].dt.to_period('M')

# Grouping by 'Year-Month' and count the incidents for each period
monthly_crime_counts = crime_data.groupby('Year-Month').size()

# Plotting the number of crime incidents by Year-Month
plt.figure(figsize=(12, 6))
monthly_crime_counts.plot(kind='line', marker='o', color='skyblue', linewidth=2)

# Adding labels and title
plt.xlabel('Year-Month')
plt.ylabel('Number of Incidents')
plt.title('Number of Crime Incidents by Year-Month')

# Customizing the plot
plt.grid(True)
plt.xticks(rotation=45) 
plt.tight_layout()  

# Displaying the plot
plt.show()
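The line plot shows the raw monthly series. To look at annual and seasonal patterns separately, the same series can be aggregated by year and by calendar month. The sketch below uses a hypothetical monthly series with a PeriodIndex, shaped like monthly_crime_counts; the numbers are placeholders, not results from the dataset.

```python
import pandas as pd

# Hypothetical monthly counts indexed by monthly Period (placeholder values)
idx = pd.period_range('2020-01', '2021-12', freq='M')
monthly = pd.Series(range(100, 100 + len(idx)), index=idx)

# Annual totals: sum of the twelve months in each year
annual_totals = monthly.groupby(monthly.index.year).sum()

# Seasonal profile: average count for each calendar month across years
seasonal_profile = monthly.groupby(monthly.index.month).mean()

# Three-month rolling mean to smooth short-term fluctuations
rolling_3m = monthly.rolling(window=3).mean()

print(annual_totals)
print(seasonal_profile.head())
```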

4. Predictive Modeling

In [29]:
#Grouping data by year-month to get monthly crime counts 
#and defining year_month as 'ds' and Crime_count as 'y'
monthly_crime_count = crime_data.groupby('Year-Month').size().reset_index(name='Count')

monthly_crime_count['Year-Month'] = pd.to_datetime(monthly_crime_count['Year-Month'].astype(str))
monthly_crime_count.columns = ['ds', 'y']
monthly_crime_count.head()
Out[29]:
ds y
0 2020-01-01 16725
1 2020-02-01 15586
2 2020-03-01 14309
3 2020-04-01 13505
4 2020-05-01 14915
In [30]:
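# Initializing and fitting the Prophet model
model = Prophet()
model.fit(monthly_crime_count)

# Creating a DataFrame for future predictions (remaining months of 2024).
# Note: freq='M' produces month-end dates, whereas the training 'ds' values are
# month-start dates; freq='MS' would keep the forecast dates aligned with the history.
future = model.make_future_dataframe(periods=4, freq='M')
forecast = model.predict(future)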

# Displaying the forecast results
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(13)
Out[30]:
ds yhat yhat_lower yhat_upper
48 2024-01-01 14885.576198 10795.613319 18773.847838
49 2024-02-01 13425.375931 9721.432351 17479.876831
50 2024-03-01 13911.225252 9979.239810 17942.995117
51 2024-04-01 12776.488784 8773.805889 16715.204070
52 2024-05-01 12728.544858 9035.595705 16688.040021
53 2024-06-01 12612.610423 8822.706444 16479.590835
54 2024-07-01 13020.227321 9299.558642 16832.479950
55 2024-08-01 12634.156872 8841.442804 16464.512741
56 2024-09-01 10627.285262 6695.002871 14799.672453
57 2024-09-30 12226.569253 8299.856286 15939.147945
58 2024-10-31 13068.577006 9148.468207 16879.194605
59 2024-11-30 15504.770064 11553.225510 19377.436226
60 2024-12-31 14322.701267 10018.081385 18105.088780
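In the table above, rows 57 to 60 carry month-end dates (starting with 2024-09-30) while the history uses month-start dates, so the first "future" row still falls in September. The pandas-only sketch below reproduces both date sequences from the last observed date to make the difference visible; it does not call Prophet. Passing freq='MS' with periods=3 to make_future_dataframe is one way to obtain the month-start sequence, offered here as a suggestion rather than the notebook's method.

```python
import pandas as pd

# Last 'ds' value in the training data shown above
last_observed = pd.Timestamp('2024-09-01')

# Month-end stamps: what a month-end frequency yields after the last observation
month_ends = pd.date_range(start=last_observed, periods=4,
                           freq=pd.offsets.MonthEnd())

# Month-start stamps for the three remaining months of 2024
# (the first generated stamp is the last observation itself, so it is skipped)
month_starts = pd.date_range(start=last_observed, periods=4,
                             freq=pd.offsets.MonthBegin())[1:]

print('month-end  :', month_ends.strftime('%Y-%m-%d').tolist())
print('month-start:', month_starts.strftime('%Y-%m-%d').tolist())
```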
In [31]:
# Plotting traces for actual and forecasted data
trace_actual = go.Scatter(
    x=monthly_crime_count['ds'],
    y=monthly_crime_count['y'],
    mode='lines+markers',
    name='Actual Crime Count',
    marker=dict(color='blue'),
    line=dict(color='blue', width=2)
)

trace_forecast = go.Scatter(
    x=forecast['ds'],
    y=forecast['yhat'],
    mode='lines',
    name='Forecasted Crime Count',
    marker=dict(color='red'),
    line=dict(color='red', width=2)
)

trace_upper = go.Scatter(
    x=forecast['ds'],
    y=forecast['yhat_upper'],
    mode='lines',
    name='Forecast Upper Bound',
    line=dict(color='grey', dash='dash'),
    showlegend=False
)

trace_lower = go.Scatter(
    x=forecast['ds'],
    y=forecast['yhat_lower'],
    mode='lines',
    name='Forecast Lower Bound',
    line=dict(color='grey', dash='dash'),
    fill='tonexty',  # Fill between the upper and lower bounds
    fillcolor='rgba(128, 128, 128, 0.2)',
    showlegend=False
)

# Combining the traces into a figure
data = [trace_actual, trace_forecast, trace_upper, trace_lower]

# Layout customization
layout = go.Layout(
    title='Actual vs Forecasted Monthly Crime Count',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Crime Count'),
    hovermode='closest'
)

# Creating the figure and plot
fig = go.Figure(data=data, layout=layout)
fig.show()
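The plot compares the forecast with the observed counts visually. A numeric summary can complement it, for example the mean absolute error and the share of observations that fall inside the uncertainty interval. The sketch below assumes a forecast frame with the same columns Prophet returns (ds, yhat, yhat_lower, yhat_upper). The observed values are the first three rows of the monthly counts shown earlier; the forecast values are hypothetical, and forecast_errors is a helper name introduced here.

```python
import pandas as pd

def forecast_errors(actual, forecast):
    """Compare observed 'y' with 'yhat' on matching 'ds' dates."""
    cols = ['ds', 'yhat', 'yhat_lower', 'yhat_upper']
    merged = actual.merge(forecast[cols], on='ds', how='inner')
    abs_err = (merged['y'] - merged['yhat']).abs()
    inside = merged['y'].between(merged['yhat_lower'], merged['yhat_upper'])
    return {'mae': abs_err.mean(), 'coverage': inside.mean()}

ds = pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01'])

# Observed monthly counts (first rows of the table shown earlier)
actual = pd.DataFrame({'ds': ds, 'y': [16725, 15586, 14309]})

# Hypothetical forecast values, for illustration only
forecast_demo = pd.DataFrame({
    'ds': ds,
    'yhat': [16225, 15086, 14809],
    'yhat_lower': [15500, 14500, 14000],
    'yhat_upper': [17000, 15500, 15500],
})

metrics = forecast_errors(actual, forecast_demo)
print(metrics)
```

Scoring the model on months it was fitted on tends to look optimistic; holding out the last few months before fitting would give a more honest estimate.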

5. Spatial Analysis

In [36]:
# Converting 'LAT' and 'LON' to numeric and collecting the valid coordinate pairs in a list
crime_data['LAT'] = pd.to_numeric(crime_data['LAT'], errors='coerce')
crime_data['LON'] = pd.to_numeric(crime_data['LON'], errors='coerce')
crime_locations = crime_data[['LAT', 'LON']].dropna().values.tolist()

# Defining map bounds for LA area
min_lat, max_lat = 33.5, 34.5  
min_lon, max_lon = -119.0, -117.5  

# Initializing a Folium map centered on LA with fixed zoom limits;
# max_bounds=True restricts panning to the latitude/longitude limits passed here
map = folium.Map(
    location=[34.05, -118.25], 
    zoom_start=11,
    min_zoom=10,
    max_zoom=14,
    min_lat=min_lat,
    max_lat=max_lat,
    min_lon=min_lon,
    max_lon=max_lon,
    max_bounds=True  
)

# Adding the heat map layer
HeatMap(crime_locations, radius=10, blur=10, max_zoom=13, opacity=0.2).add_to(map)

# Adding a custom legend
legend_html = """
<div style="
    position: fixed;
    bottom: 50px;
    left: 50px;
    width: 200px;
    height: 120px;
    background-color: white;
    border:2px solid grey;
    z-index:9999;
    font-size:14px;
    padding: 10px;
    ">
    <b>Crime Density Legend</b><br>
    <i style="background: blue; width: 10px; height: 10px; display: inline-block;"></i> Low Density<br>
    <i style="background: green; width: 10px; height: 10px; display: inline-block;"></i> Medium Density<br>
    <i style="background: orange; width: 10px; height: 10px; display: inline-block;"></i> High Density<br>
    <i style="background: red; width: 10px; height: 10px; display: inline-block;"></i> Very High Density<br>
</div>
"""
map.get_root().html.add_child(features.Element(legend_html))

# Adding a custom title
title_html = """
<div style="
    position: fixed;
    top: 10px;
    left: 50%;
    transform: translateX(-50%);
    z-index: 1000;
    background-color: white;
    padding: 10px;
    font-size: 20px;
    font-weight: bold;
    border: 2px solid grey;
    border-radius: 5px;
">
    Crime Heatmap of Los Angeles
</div>
"""
map.get_root().html.add_child(features.Element(title_html))

# Displaying the map
map
Out[36]:
[Interactive map: Crime Heatmap of Los Angeles]
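The heatmap gives a visual impression of density. To rank hotspots numerically, one simple option is to bin the coordinates into a regular grid and count incidents per cell. This is a grid-binning sketch, not a clustering algorithm and not a method implemented in this notebook. The sample points are hypothetical (a few are based on coordinates in the data preview), and the 0.01-degree cell size, roughly 1 km at this latitude, is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample points, including one (0, 0) placeholder
points = pd.DataFrame({
    'LAT': [34.0444, 34.0449, 34.0441, 34.1576, 0.0, 34.0944],
    'LON': [-118.2628, -118.2621, -118.2633, -118.4387, 0.0, -118.3277],
})

# Keep only points inside the LA bounding box used for the map
min_lat, max_lat = 33.5, 34.5
min_lon, max_lon = -119.0, -117.5
in_bounds = (points['LAT'].between(min_lat, max_lat)
             & points['LON'].between(min_lon, max_lon))
points = points[in_bounds]

# Assumption: 0.01-degree grid cells
cell = 0.01
grid = points.assign(
    lat_idx=np.floor(points['LAT'] / cell).astype(int),
    lon_idx=np.floor(points['LON'] / cell).astype(int),
)

# Count incidents per cell and list the densest cells first
hotspots = (grid.groupby(['lat_idx', 'lon_idx'])
                .size()
                .sort_values(ascending=False))
print(hotspots.head())
```

Multiplying a cell index by the cell size recovers the south-west corner of that cell, which could then be drawn on the Folium map as a marker or rectangle.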

Reference

Editor, W. D., & Tikkanen, A. (2024, September 3). List of the largest U.S. cities by population. Encyclopædia Britannica. https://www.britannica.com/topic/Whats-the-largest-US-city-by-population
